Current Issue : October - December Volume : 2012 Issue Number : 4 Articles : 5 Articles
Clustering algorithms have been used to improve the speed and quality of placement. Traditionally, clustering focuses on the local connections between cells. In this paper, a new clustering algorithm based on both the estimated lengths of circuit interconnects and the connectivity is proposed. In the proposed algorithm, an a priori length estimation technique is first used to estimate the lengths of nets. The estimated lengths are then used in a clustering framework to modify a clustering technique based on algebraic multigrid (AMG), which finds the cells with the highest connectivity. Finally, clusters are made based on the results of the AMG-based process. In addition, a new physical unclustering technique is proposed. The results show that significant wire-length reductions, of up to 40%, can be achieved when using the proposed technique with three academic placers on industry-based circuits. Moreover, the runtime is not significantly degraded and can even be improved....
The optimization process of an H.264/AVC encoder on three different architectures is presented. The architectures are multi- and single-core, and their SIMD instruction sets have different vector register sizes. Code optimization is fundamental when addressing HD resolutions with real-time constraints. The encoder is subdivided into functional modules in order to better understand where optimization is a key factor and to evaluate the performance improvement in detail. Common issues in both partitioning a video encoder onto parallel architectures and SIMD optimization are described, and the authors' solutions are presented for all the architectures. Besides showing efficient video encoder implementations, one of the main purposes of this paper is to discuss how the characteristics of different architectures and different sets of SIMD instructions can affect the performance of the target application. Results on the achieved speedup are provided in order to compare the different implementations and identify the most suitable solutions for present and next-generation video-coding algorithms....
The increasing demand for high-fidelity portable devices has placed emphasis on the development of low-power, high-performance systems. In next-generation processors, low-power design has to be incorporated into fundamental computation units such as adders. Adders play an important role in arithmetic operations such as addition, subtraction, multiplication, and division. The characterization and optimization of such low-power adders will aid in the comparison and choice of adder modules in system design. In this paper, we perform a comparative analysis of the power, delay, and power-delay product (PDP) optimization characteristics. The paper deals with the transistor-based design of several adders; simulations are done with the DSCH 3.1 and Microwind 3.1 CAD tools. The 10-transistor adder circuit shows the lowest power consumption among those compared....
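To make the quantities in this abstract concrete, the following minimal sketch models the logic function of a 1-bit full adder and computes a power-delay product as average power multiplied by propagation delay. The function names and the sample power and delay values are hypothetical illustrations, not circuits or measurements from the paper.

```python
import math


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Logic-level model of a 1-bit full adder; returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def power_delay_product(power_watts: float, delay_seconds: float) -> float:
    """PDP in joules: average power multiplied by propagation delay."""
    return power_watts * delay_seconds


# Hypothetical figures chosen only to show the arithmetic (not from the paper).
example_pdp = power_delay_product(2e-6, 1e-9)
```

A lower PDP indicates less energy spent per operation, which is why it is commonly used to compare adder cells that trade power against speed.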
The paper presents a unified hybrid architecture to compute the 8 × 8 integer inverse discrete cosine transform (IDCT) of multiple modern video codecs: AVS, H.264/AVC, VC-1, and HEVC (under development). Based on the symmetric structure of the matrices and the similarity in matrix operation, we develop a generalized "decompose and share" algorithm to compute the 8 × 8 IDCT. The algorithm is later applied to four video standards. The hardware-share approach ensures the maximum circuit reuse during the computation. The architecture is designed with only adders and shifters to reduce the hardware cost significantly. The design is implemented on FPGA and later synthesized in CMOS 0.18 µm technology. The results meet the requirements of advanced video coding applications...
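The symmetric matrix structure mentioned in this abstract can be illustrated with the standard even/odd split of an 8-point inverse transform. The sketch below is not the paper's "decompose and share" algorithm. It assumes an illustrative integer matrix (a scaled, rounded DCT-II basis, not the coefficient set of any particular standard) and shows that the second half of the outputs can be obtained from the same partial sums as the first half. All function names are hypothetical.

```python
import math

N = 8


def build_matrix(scale: int = 64) -> list[list[int]]:
    """Illustrative integer matrix: scaled, rounded DCT-II basis (an assumption).

    Each row is mirrored explicitly so that even rows are symmetric and
    odd rows are antisymmetric: c[k][7 - n] == (-1)**k * c[k][n].
    """
    c = [[0] * N for _ in range(N)]
    for k in range(N):
        norm = math.sqrt(0.5) if k == 0 else 1.0
        for n in range(N // 2):
            v = round(scale * norm * math.cos((2 * n + 1) * k * math.pi / 16))
            c[k][n] = v
            c[k][N - 1 - n] = v if k % 2 == 0 else -v
    return c


def idct_direct(coeffs: list[int], c: list[list[int]]) -> list[int]:
    """Direct inverse transform: 64 multiplications."""
    return [sum(c[k][n] * coeffs[k] for k in range(N)) for n in range(N)]


def idct_even_odd(coeffs: list[int], c: list[list[int]]) -> list[int]:
    """Even/odd split: 32 multiplications, shared between outputs n and 7 - n."""
    out = [0] * N
    for n in range(N // 2):
        even = sum(c[k][n] * coeffs[k] for k in range(0, N, 2))
        odd = sum(c[k][n] * coeffs[k] for k in range(1, N, 2))
        out[n] = even + odd          # first half of the outputs
        out[N - 1 - n] = even - odd  # mirrored half reuses the same sums
    return out
```

Because each pair of outputs shares its even and odd partial sums, the split halves the multiplication count, which is the kind of reuse a shared hardware datapath can exploit.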
The canonical signed digit (CSD) representation of constant coefficients is a unique signed-digit representation containing the fewest nonzero digits. Consequently, for constant multipliers, the number of additions and subtractions is minimized by the CSD representation of the constant coefficients. This technique is mainly used in finite impulse response (FIR) filters to reduce the number of partial products. In this paper, we use CSD with a novel common subexpression elimination (CSE) scheme on the optimal Loeffler algorithm for the computation of the discrete cosine transform (DCT). To meet the challenges of low-power and high-speed processing, we present an optimized image compression scheme based on the two-dimensional DCT. Finally, a novel and simple reconfigurable quantization method combined with DCT computation is presented to effectively reduce the computational complexity. We present here a new DCT architecture based on the proposed technique. From the experimental results obtained from the FPGA prototype, we find that the proposed design has several advantages in terms of power reduction, speed performance, and saving of silicon area, along with PSNR improvement over the existing designs as well as the Xilinx core....
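As a minimal sketch of the CSD idea described in this abstract, the code below converts an integer constant to signed digits in {-1, 0, +1} with no two adjacent nonzero digits, then multiplies by that constant using only shifts, additions, and subtractions. The function names are hypothetical, and the sketch does not include the paper's common subexpression elimination scheme.

```python
def to_csd(n: int) -> list[int]:
    """Return the CSD digits of n, least significant first, each in {-1, 0, 1}."""
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 gives digit +1; n mod 4 == 3 gives digit -1,
            # which forces the next digit to be zero (no adjacent nonzeros).
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def from_csd(digits: list[int]) -> int:
    """Rebuild the integer value from its CSD digits."""
    return sum(d << i for i, d in enumerate(digits))


def csd_multiply(x: int, constant: int) -> int:
    """Multiply x by a constant using one shift-add or shift-subtract per nonzero digit."""
    acc = 0
    for i, d in enumerate(to_csd(constant)):
        if d == 1:
            acc += x << i
        elif d == -1:
            acc -= x << i
    return acc
```

For example, 7 is 111 in binary (three nonzero bits), while its CSD form is 8 - 1 (two nonzero digits), so multiplying by 7 needs one subtraction of shifted operands instead of two additions.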